from IPython.core.display import HTML
# https://stackoverflow.com/questions/32156248/how-do-i-set-custom-css-for-my-ipython-ihaskell-jupyter-notebook
with open('./custom_css.css', 'r') as f:
    styles = f.read()
s = '<style>%s</style>' % styles
HTML(s)
On the "Social data" tab, you will find a worksheet that reflects what one of our typical Twitter data pulls looks like.
For this task, please focus only on a few of the columns.
Task: Find the main problems that are being talked about on Twitter, at the city level and at the neighbourhood level.
On the "Dictionary" tab, we have shortlisted some problems, classified into main categories and sub-categories, each of them with keyword proxies. Use INDEX/MATCH to find the count of posts that talk about the respective problems.
On the "Search data" tab, you will see the results of a typical Google search data pull.
Task: Create a clear visualisation that shows the different types of violence that have been searched for over the past year, and what the main discourse is around each type of violence.
- 1.1. social_data (Twitter data pulls)
On the "Social data" tab, you will find a worksheet that reflects what one of our typical Twitter data pulls looks like.
| | keyword | neighborhood | city | category | created_at | post_id | text | lang | translated_text |
|---|---|---|---|---|---|---|---|---|---|
| 908 | Araby Cikány -has:links | Araby | Växjö | NaN | Fri Aug 17 08:55:41 +0000 2018 | 1.030378e+18 | @ChytraJako to je doufám řečnická otázka 😁 Nikdo nemůže dobrovolně chtít DVĚ tchyně na jednom místě 😶😁\nprosimtě co na nich konkrétně miluješ? Feťáky a bezdomovce po pas zabořený v popelnicích, araby nebo cikány? 😀 | cs | @ChytraHow is this a hopeful rhetorical question 😁 No one can voluntarily want TWO mother-in-law in one place 😶😁 please what do you specifically love about them? Junkies and homeless passports buried in bins, Arabs or gypsies? 😀 |
For this task, please just focus on these columns: the input keyword used to extract tweets (`keyword`), the location of the tweets (`city`, `neighborhood`), and the actual tweets (`text`, `translated_text`).
- 1.2. dictionary (keyword for tweets)
On the "Dictionary" tab, we have shortlisted some problems, classified into main categories and sub-categories, each of them with keyword proxies.
| CATEGORY | SUB CATEGORY | KEYWORD |
|---|---|---|
| Crime | problem area | Vulnerable area |
| Neglect | poverty | impoverished |
| Xenophobia | Islamophobia | kebab seller |
| Crime | gun violence | loud bangs |
| Parking | parking | dense |
| Crime | rape | sexual assault |
- 2.1. search_data (Google search data pull)
On the "Search data" tab, you will see the results of a typical Google search data pull.
| id | Search KWs | Type of violence | Discourse theme | Search Volume m/yyyy |
|---|---|---|---|---|
| ... | abuses of internet | Psychological | Online harassment | 1600.0 |
| 98 | spouse abuse | Physical | marital violence | 4400.0 |
| ... | how to expose wife | Psychological | Doxxing | 0.0 |
| ... | harassment at work | Psychological | Workplace harassment | 6600.0 |
| ... | loss of masculinity | Physical | Violence due to social rejection | 0.0 |
| ... | face acid | Physical | Acid violence | 8100.0 |
| 115 | wife made to have sex | Sexual | marital rape | NaN |
| 55 | revenge porn | Psychological | Violence due to social rejection | 246000.0 |
| 100 | domestic violence against women | Physical | marital violence | 2400.0 |
Let's import the modules we need to process the data easily
import sys; sys.path.insert(0, '..')
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
%matplotlib inline
import missingno
import folium
import cartopy
from wordcloud import WordCloud
from nltk import TweetTokenizer
import warnings; warnings.filterwarnings('ignore')
import src.analytical_processing as analytical_processing
import src.visualisations as visuals_prepared
import importlib; importlib.reload(analytical_processing); importlib.reload(visuals_prepared);
Let's load (read) the data we have
# main datasets
social_data = pd.read_excel('../data/raw/Data Analysis and Visualization Assignment.xlsx',
sheet_name='Social data')
dictionary = pd.read_excel('../data/raw/Data Analysis and Visualization Assignment.xlsx',
sheet_name='Dictionary ')
search_data = pd.read_excel('../data/raw/Data Analysis and Visualization Assignment.xlsx',
sheet_name='Search data')
# auxiliary datasets
cities_geo_coords = pd.read_csv('../data/external/geocoded_by_geoapify-7_6_2022, 10_26_18 PM.csv')
social_data.head(2)
dictionary.head(2)
search_data.head(2)
Let's pre-process the datasets we have
%%time
# un-escape HTML-encoded apostrophes (&#39;) left over from the data pull
social_data['translated_text'] = social_data.translated_text.str.replace('&#39;', "'")
# fix "@ " only where it appears in the translated text but not in the original
mask = social_data.text.str.contains('@ ', na=False) & social_data.translated_text.str.contains('@ ', na=False)
social_data['translated_text'] = social_data.translated_text.where(
    cond=mask,  # keep values where the mask holds
    other=lambda x: x.str.replace('@ ', '@'))
# drop rows with missing original or translated text
social_data = social_data[~social_data.text.isna()]
social_data = social_data[~social_data.translated_text.isna()]
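To make the `where` step concrete: `Series.where(cond, other)` keeps values where `cond` is True and substitutes `other` elsewhere. A minimal sketch on toy data (the column names merely mimic `social_data`):

```python
import pandas as pd

# Toy frame mimicking social_data's two text columns.
df = pd.DataFrame({
    "text": ["hello @ world", "plain text", None],
    "translated_text": ["hello @ world", "broken @ mention", None],
})

# Keep the translated text as-is where BOTH columns contain "@ ";
# elsewhere collapse the stray "@ " introduced by translation back into "@".
mask = (df.text.str.contains("@ ", na=False)
        & df.translated_text.str.contains("@ ", na=False))
df["translated_text"] = df.translated_text.where(
    mask, other=df.translated_text.str.replace("@ ", "@", regex=False))

# Drop rows with missing original or translated text, as in the notebook.
df = df[~df.text.isna() & ~df.translated_text.isna()]
print(df.translated_text.tolist())  # ['hello @ world', 'broken @mention']
```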
$\star$ The data is about Sweden (Sverige). Here are two maps to show what and where Sweden is. Much of the text data was originally in Swedish.
*(Figure: Map of Sweden)*
We've got $13$ cities, each with the respective number (#) of neighborhoods listed below. Worth mentioning that the definition of "neighborhood" is quite broad here.
| City | Neighborhoods | # |
|---|---|---|
| Borås | Hässleholmen, Sjöbo | 2 |
| Eskilstuna | Öster, Fröslunda, Stenby, Råbergstorp, Myrtorp, Norr Eskilstuna, Övre Nyfors, Snopptorp, Söder Eskilstuna | 9 |
| Göteborg | Bergsjön, Angered, Gårdsten, Kållered, Lindome | 5 |
| Haninge | Brandbergen | 1 |
| Karlskrona | Cottage, Rödeby, Björkhaga, Jämjö, Trossö, Galgamarken, Pantarholmen | 7 |
| Kristianstad | Östermalm, Centrum, Udden, Parkstaden, Öllsjö | 5 |
| Linköping | Skäggetorp, Ryd | 2 |
| Malmö | Rosengård, Innerstaden, Hermodsdal, Lindängen, Heleneholm | 5 |
| Markaryd | Markaryd, Strömsnäsbruk, Timsfors | 3 |
| Nyköping | Brandkärr, Fågelbo | 2 |
| Tensta | Tensta | 1 |
| Växjö | Biskopsgården, Araby | 2 |
| Örebro | Brickebacken, Varberga | 2 |
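Counts like these can be reproduced with a plain `groupby`; a sketch on a hypothetical slice of the location columns (the real data comes from the Excel workbook loaded above):

```python
import pandas as pd

# Hypothetical slice of social_data's two location columns.
toy = pd.DataFrame({
    "city": ["Borås", "Borås", "Borås", "Haninge", "Växjö", "Växjö"],
    "neighborhood": ["Hässleholmen", "Sjöbo", "Sjöbo",
                     "Brandbergen", "Araby", "Biskopsgården"],
})

# Distinct neighborhoods per city, as in the table above.
counts = toy.groupby("city").neighborhood.nunique()
print(counts.to_dict())  # {'Borås': 2, 'Haninge': 1, 'Växjö': 2}
```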
Let's plot an overall distribution of un-categorized (raw) tweeted texts by every city we have.
left_aux = social_data.groupby('city', as_index=False).translated_text.count()
right_aux = cities_geo_coords[['original_City', 'lat', 'lon']]
city_count_aux = pd.merge(left_aux, right_aux, left_on='city', right_on='original_City')
del left_aux, right_aux #, cities_geo_coords
map_object = visuals_prepared.plot_folium_map_with_circles(df_to_plot=city_count_aux, save=True)
map_object.save('../figures/Map_all_texts.html')
map_object
Source: [1]
Then, let's plot the same distribution, but broken down by "neighborhood".
df_to_plot = social_data.groupby(['city', 'neighborhood']).translated_text.count().unstack('neighborhood')
fig, axes = plt.subplots(1, 1, figsize = (12, 8), dpi=200);
df_to_plot.plot.bar(stacked=True, ax=axes, zorder=2)
axes.set_title(label='Number of Texts by City and Neighborhood',
loc='left', fontsize=16, fontweight='bold')
axes.get_legend().remove()
axes.spines[['top', 'right']].set_visible(False)
axes.grid(which='major', axis='y', linestyle='--', alpha=0.8, zorder=1)
fig.tight_layout(rect=[0, 0.00, 1.5, 1.])
What did we find regarding locations?
On one hand, every location we saw above is described primarily by its vector of tweets and related technical features (keywords, datetimes).
%%time
fig, axes = plt.subplots(3, 5, figsize=(12, 6), dpi=250);
axes = axes.flatten()
visuals_prepared.plot_wordcloud(social_data[~social_data.translated_text.isna()].translated_text,
tokenizer=TweetTokenizer(preserve_case=False), axis=axes[0])
axes[0].set_title("All locations" + f", n={social_data.shape[0]}", loc='left')
for indx, city in enumerate(social_data.city.unique(), start=1):
df_to_plot = social_data[social_data.city == city].translated_text[0:]
visuals_prepared.plot_wordcloud(df_to_plot, tokenizer=TweetTokenizer(preserve_case=False), axis=axes[indx])
title = f"{city}, n={df_to_plot.shape[0]}"
axes[indx].set_title(title, loc='left' ) #, fontweight='bold'
#break
fig.delaxes(axes[-1]);
fig.tight_layout();
On the other hand, we've got a dictionary with a categorization of concerns and their respective keywords.
"transgender", "transsexual" and "transvestite" might be used to describe the same phenomenon by people of different life experiences.
| CATEGORY | # KWords | SUB CATEGORY |
|---|---|---|
| Crime | 91 | problem area, gun violence, drugs, rape, Fear of Safety, gangs, robbery & vandalism, battery and assault |
| Neglect | 24 | poverty, housing conditions, trash |
| Parking | 14 | parking |
| Rental landscape | 25 | rising rent, housing availability |
| Xenophobia | 26 | Islamophobia, Xenophobia |
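A summary of this shape can be produced with `groupby`/`agg`; a sketch on a toy frame with the same three columns as the "Dictionary" tab (the keyword counts here are illustrative, not the real 91/24/14/25/26):

```python
import pandas as pd

# Toy dictionary frame with the "Dictionary" tab's three columns.
toy = pd.DataFrame({
    "CATEGORY": ["Crime", "Crime", "Crime", "Parking", "Neglect"],
    "SUB CATEGORY": ["rape", "drugs", "drugs", "parking", "poverty"],
    "KEYWORD": ["sexual assault", "needles", "dealer", "dense", "impoverished"],
})

# One row per category: keyword count plus the joined list of sub-categories.
summary = toy.groupby("CATEGORY").agg(
    n_keywords=("KEYWORD", "count"),
    subcategories=("SUB CATEGORY", lambda s: ", ".join(s.unique())),
)
print(summary.loc["Crime", "n_keywords"])     # 3
print(summary.loc["Crime", "subcategories"])  # rape, drugs
```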
df_to_plot = dictionary.groupby(['CATEGORY', 'SUB CATEGORY']).KEYWORD.count().unstack('SUB CATEGORY')
fig, axes = plt.subplots(1, 1, figsize = (12, 8), dpi=200);
df_to_plot.plot.bar(stacked=True, ax=axes, zorder=2)
for container in axes.containers:
axes.bar_label(container, padding=2, )
axes.set_title(label='Number of Keywords by CATEGORY and SUB CATEGORY',
loc='left', fontsize=16, fontweight='bold')
axes.get_legend().remove()
axes.spines[['top', 'right']].set_visible(False)
axes.grid(which='major', axis='y', linestyle='--', alpha=0.8, zorder=1)
fig.tight_layout(rect=[0, 0.00, 1.5, 1.])
What did we find in a dictionary dataset?
First task:
Task: Find the main problems that are being talked about on Twitter, at the city level and at the neighbourhood level.
Solution explanation:
- social_data
social_data.sample(3)
- dictionary
dictionary.sample(2)
Let's run the classification ("categorization") pipeline
%%time
# let's prepare regexprs string
keywords_grouped = dictionary.groupby(['CATEGORY', 'SUB CATEGORY']).KEYWORD.unique()
keyword_strings = analytical_processing._prepare_keyword_sring(keywords_grouped)
# let's run keywords search
df_aux = social_data[['keyword', 'neighborhood', 'city', 'lang', 'translated_text', 'category']]
social_data_categorized_keywords = analytical_processing.naive_keyword_search(
df=df_aux,
keywords_str=keyword_strings)
social_data_categorized_keywords.head(2)
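`naive_keyword_search` lives in `src.analytical_processing`, which is not shown here. A rough sketch of the idea as I read it, and only a guess at it (one regex alternation per `(CATEGORY, SUB CATEGORY)` pair, applied to each text; the function name and behavior below are assumptions, not the project's implementation):

```python
import re

import pandas as pd

# NOTE: this is NOT the project's naive_keyword_search, only a sketch of the idea.
def naive_keyword_search_sketch(texts: pd.Series,
                                keywords_grouped: pd.Series) -> pd.DataFrame:
    """One column per (CATEGORY, SUB CATEGORY): the first matched keyword, else NaN."""
    out = pd.DataFrame(index=texts.index)
    for (cat, sub), kws in keywords_grouped.items():
        pattern = "(" + "|".join(re.escape(kw) for kw in kws) + ")"
        out[f"{cat}_{sub}"] = texts.str.extract(pattern, flags=re.IGNORECASE,
                                                expand=False)
    return out

texts = pd.Series(["Loud bangs near the station", "nothing to report"])
grouped = pd.Series({("Crime", "gun violence"): ["loud bangs", "gunshots"]})
result = naive_keyword_search_sketch(texts, grouped)
print(result["Crime_gun violence"].tolist())  # ['Loud bangs', nan]
```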
Let's add additional columns to see how many problem sub-categories were detected in each text.
E.g., a text matching both the Crime_Fear of Safety and Crime_drugs sub-categories would count $2$ for the Crime category in general.
df_aux = social_data_categorized_keywords.iloc[:, :5]
df_aux = pd.concat([df_aux, ~social_data_categorized_keywords.iloc[:, 5:].isna()], axis=1)
columns = social_data_categorized_keywords.columns
for cat in dictionary.CATEGORY.unique():
    # how many sub-categories of this category were detected in each text
    target_columns = columns[columns.str.contains(f'{cat}_', regex=False)]
    df_aux[cat] = df_aux[target_columns].sum(1)
df_aux.head(3)
From the "methodological" side
From the "analytical" side
From the "technological" side
After classification we see the following results:
- tweet 908 - "@ChytraHow is this a hopeful rhetorical question 😁 No one can voluntarily want TWO mother-in-law in one place 😶😁 please what do you specifically love about them? Junkies and homeless passports buried in bins, Arabs or gypsies? 😀" (4 categories)
- tweet 1232 - "@mathieuvonrohr @Andromake000 This is how blacks in gangs behave in Sweden too! Immigrants-youth gangs-rape gangs etc! #svpol #migpol They go in hordes, burn cars like drug cartels (Somalis) Biskopsgården, Rinkeby! I HATE THEM! Rapper = #violence #booba #kaari's real assholes!" (4 keywords from the Crime category)
- Category (& Subcategory) should be binary, not "more" or "less" important because of the number of matched keywords

Next possible steps:
- dig deeper into the `keyword` column (i.e. `Östermalm (flyktingar OR asylsökande) -has:links`)

Some texts had more than one category keyword (overlapping categories), some had none (not categorized). Let's see the distribution of categorization.
axes = missingno.matrix(df_aux.iloc[:, -5:].replace(0, np.nan), sparkline=True)
axes.set_title('Texts by Category keywords found', x=0.16, fontsize=20);
Sometimes, as said above, texts had more than one target keyword.
The Crime category prevails here; there were about $700$ texts with more than one keyword.
fig, axes = plt.subplots(1, 5, figsize=(16, 4), dpi=250, sharey=True)
for indx, category in enumerate(df_aux.columns[-5:]):
df_aux[category].value_counts()[1:].plot.bar(ax=axes[indx], zorder=100)
axes[indx].set_xlim(-1, 5)
axes[indx].set_xticks(range(-1, 5), )
axes[indx].set_xticklabels([''] + list(range(1, 6)))
axes[indx].set_title(category, loc='left', rotation=0, y=1.05)
axes[indx].grid(axis='y', alpha=0.4, linestyle='-', lw=0.5)
for container in axes[indx].containers:
axes[indx].bar_label(container, padding=1)
axes[indx].spines[["top", "right", "left", "bottom"]].set_visible(False)
#axes[indx].get_yaxis().set_visible(False)
fig.suptitle('Count of keywords in texts by Problem category', x=0.155, y=1.0,
fontweight='bold', fontsize=12)
fig.tight_layout(rect=[0, 0.00, 0.85, 1.]);
So, we "categorized" our texts. Let's see what we got.
columns_mask = df_aux.columns[0:5].append(df_aux.columns[-5:])
df_aux_tmp = df_aux[columns_mask]
df_aux_tmp.iloc[:, -5:] = df_aux_tmp.iloc[:, -5:] >= 1  # comment out to keep raw counts instead of binary flags
df_categories_count = df_aux_tmp.groupby(['city']).sum()
df_categories_count = pd.merge(left=city_count_aux, right=df_categories_count,
left_on='city', right_index=True)
df_categories_count.iloc[:, -5:] = df_categories_count.iloc[:, -5:]\
.div(df_categories_count.translated_text, axis=0)
df_categories_count = df_categories_count.set_index('city')
df_categories_count
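The share computation above is just each category count divided by the city's text total; `DataFrame.div(..., axis=0)` on a toy frame shows the mechanics (city names and counts below are illustrative):

```python
import pandas as pd

# Toy per-city counts: total texts plus two hypothetical category counts.
counts = pd.DataFrame({"total": [10, 20], "Crime": [5, 4], "Neglect": [2, 6]},
                      index=["Malmö", "Borås"])

# Row-wise division: each category count becomes a share of the city's total.
shares = counts[["Crime", "Neglect"]].div(counts["total"], axis=0)
print(shares.loc["Malmö"].tolist())  # [0.5, 0.2]
```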
fig, axes = plt.subplots(4, 4, dpi=250)
axes = axes.flatten()
colors = None #['red', 'blue', 'gray', 'yellow', 'black'] # '#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd'
for indx, (city, row) in enumerate(df_categories_count.iterrows()):
# print(city, row)
axes[indx].pie(x=row[-5:], colors=colors, normalize=False) # radius=indx * 0.3
axes[indx].set_title(city + f', n={row[0]}', size=10)
fig.delaxes(axes[-3]); fig.delaxes(axes[-2]); fig.delaxes(axes[-1])
fig.legend(df_categories_count.columns[-5:], loc='upper left', ncol=len(df_categories_count.columns[-5:]),
bbox_to_anchor=(0.005, 1.02), frameon=False,
columnspacing=1., handletextpad=-2)
fig.suptitle('Shares of categorized tweets by City', x=0.28, y=1.05)
fig.tight_layout()
Source: [1] - legend's idea
Now, let's see the same shares on a country map
projection = cartopy.crs.PlateCarree()
fig, axes = plt.subplots(1, 1, figsize=(12, 8), dpi=250,
subplot_kw={'projection': projection})
colors = ['red', 'blue', 'gray', 'yellow', 'black']
for indx, (city, row) in enumerate(df_categories_count.iterrows()):
# print(city, row)
visuals_prepared._draw_pie_marker(xs=row['lon'],
ys = row['lat'],
ratios=row[-5:].to_list(),
sizes=row[0] * 0.75, # /2 # *0.8
colors=colors,
ax=axes)
visuals_prepared._add_geo_basemap(extents=[3.6435, 25.6435, 53.1282, 65.3508],
projection=projection, ax=axes)
patches_aux = zip(colors, df_categories_count.columns[-5:])
patches = [mpatches.Patch(color=patch[0], label=patch[1]) for patch in patches_aux]
axes.legend(handles = patches, loc='upper right', ncol=5, frameon=False, bbox_to_anchor=(1.01, 1.055),
columnspacing=1., handletextpad=0.1)
#fig.legend(df_categories_count.columns[-5:], loc='upper right',
# ncol=5, frameon=False, bbox_to_anchor=(0.84, 0.86))
fig.suptitle('Cities & Categories', x=0.09, y=0.85, fontweight='bold')
fig.tight_layout(rect=[0, 0.00, 0.85, 1.]);
Source: see visuals_prepared._draw_pie_marker
We saw shares of problems in joint views. What if we try to "separate" views?
projection = cartopy.crs.PlateCarree()
fig, axes = plt.subplots(3, 2, figsize=(16, 10), dpi=250,
subplot_kw={'projection': projection}
)
axes = axes.flatten()
# axes[0] -- texts matched by more than one category
tmp_ = df_aux_tmp.loc[df_aux_tmp.iloc[:, -5:].sum(1) > 1].city.value_counts()
df_to_plot = df_categories_count.reset_index()
df_to_plot['> 1 cat'] = city_count_aux.city.map(tmp_) / city_count_aux.translated_text
for indx, column in enumerate([df_to_plot.columns[-1]] + df_to_plot.columns[-6:-1].to_list(),
start=0):
axes[indx].scatter(x=df_to_plot['lon'],
y=df_to_plot['lat'],
s=df_to_plot[column] * 700,
alpha=0.4, #facecolors='none', edgecolor='blue'
)
axes[indx].set_xticklabels([]); axes[indx].set_yticklabels([])
axes[indx].set_title(column + ', % of City', loc='left')
visuals_prepared._add_geo_basemap(extents=[3.6435, 25.6435, 53.1282, 65.3508],
projection=projection, ax=axes[indx])
fig.suptitle('Categories of problems', x=0.15, fontweight='bold')
fig.tight_layout(rect=[0, 0.00, 0.85, 1.]);
del tmp_, df_to_plot;
We saw the distributions of problems above.
Let's see what vocabulary people used to explain the problems
%%time
fig, axes = plt.subplots(2, 3, figsize=(12, 8), dpi=250);
axes = axes.flatten()
df_to_plot = df_aux_tmp[df_aux_tmp.columns[0:5].append(df_aux_tmp.columns[-5:])]
visuals_prepared.plot_wordcloud(
df_to_plot[df_to_plot.sum(1, numeric_only=True) > 1].translated_text,
tokenizer = TweetTokenizer(preserve_case=False),
axis=axes[0])
axes[0].set_title('More than one category', loc='left', fontweight='bold', fontsize=16)
for indx, category in enumerate(df_to_plot.columns[5:], start=1):
visuals_prepared.plot_wordcloud(
df_to_plot[df_to_plot[category] == 1].translated_text,
tokenizer = TweetTokenizer(preserve_case=False),
axis = axes[indx])
axes[indx].set_title(category, loc='left', fontweight='bold', fontsize=16)
fig.tight_layout(rect=[0, 0.00, 1.5, 1.2]);
It's harder to present than to get. Some ideas are:
social_data_categorized_keywords.sample(3)
Task: Create a clear visualisation that shows the different types of violence that have been searched for over the past year, and what the main discourse is around each type of violence.
- Which ones have the highest search volume?
- Which ones have the highest growth in searches?
Solution explanation:
We will show:
- growth as the percentage change from $x_{i-1}$ to $x_i$

We could, if there were a way to interview the target clients, ask them to refine these definitions.
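A quick illustration of the growth definition with pandas' `pct_change` (toy numbers):

```python
import pandas as pd

# Toy monthly volumes; growth is (x_i - x_{i-1}) / x_{i-1}, i.e. pct_change.
volumes = pd.Series([100, 150, 120], index=["1/2021", "2/2021", "3/2021"])
growth = volumes.pct_change()
print(growth.round(2).tolist())  # [nan, 0.5, -0.2]
```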
- search_data
| id | Search KWs | Type of violence | Discourse theme | Search Volume m/yyyy | Search Volume m+1/yyyy |
|---|---|---|---|---|---|
| 98 | spouse abuse | Physical | marital violence | 4400.0 | ... |
| 115 | wife made to have sex | Sexual | marital rape | NaN | ... |
| 55 | revenge porn | Psychological | Violence due to social rejection | 246000.0 | ... |
| 100 | domestic violence against women | Physical | marital violence | 2400.0 | ... |
search_data.sample(3)
Notes:
- some search keywords are not in English (e.g. id 38, "महिला अधिकार संरक्षण", i.e. "protection of women's rights")
- there is an "Aggregate" type-of-violence category alongside Psychological, Physical and Sexual

Let's observe the keywords we have.
fig, axes = plt.subplots(1, 4, figsize=(12, 3), dpi=300);
axes = axes.flatten()
df_aux = search_data.groupby(['Type of violence'])['Search KWs '].unique()
for indx, type_violence in enumerate(df_aux.index):
df_to_plot = pd.Series(df_aux[df_aux.index == type_violence].values[0])
visuals_prepared.plot_wordcloud(df_to_plot,
tokenizer=TweetTokenizer(preserve_case=False),
axis=axes[indx])
title = f"{type_violence}, n={df_to_plot.shape[0]}"
axes[indx].set_title(title, loc='left') #, fontweight='bold'
#break
fig.suptitle('Search KWs by Type of violence', x=0.135, fontweight='bold')
fig.tight_layout()
Let's see if any keywords had missing values for any period(s)
ax = missingno.matrix(search_data, sparkline=False)
plt.tight_layout()
No idea how to make it more elegant right now
df_top_rows = pd.DataFrame(index=['1/2020', '2/2020', '3/2020', '4/2020']) # add aux rows
df_to_plot_yoy = search_data.groupby(['Type of violence']).sum().T
df_to_plot_yoy.index = df_to_plot_yoy.index.str.replace('Search Volume ', '')
df_to_plot_yoy = pd.concat([df_top_rows, df_to_plot_yoy], ignore_index=False) # add aux rows
df_to_plot_yoy['Month'] = df_to_plot_yoy.index.str.extract(r'(\d{1,2})/.*', expand=False)
df_to_plot_yoy['Year'] = df_to_plot_yoy.index.str.extract(r'.*/(\d{4})', expand=False)
df_to_plot_yoy['Total'] = df_to_plot_yoy.iloc[4:, 1:-2].sum(1)
del df_top_rows
df_to_plot_ytd = search_data.groupby(['Type of violence', 'Discourse theme']).sum().T
df_to_plot_ytd.index = df_to_plot_ytd.index.str.replace('Search Volume ', '')
df_to_plot_ytd['Month'] = df_to_plot_ytd.index.str.extract(r'(\d{1,2})/.*', expand=False)
df_to_plot_ytd['Year'] = df_to_plot_ytd.index.str.extract(r'.*/(\d{4})', expand=False)
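Regarding elegance: one alternative for the reshape above is `melt` plus a single `str.extract`, sketched here on a toy frame (it assumes every volume column follows the `Search Volume m/yyyy` naming):

```python
import pandas as pd

# Toy frame shaped like search_data: one row per keyword, one volume column per month.
toy = pd.DataFrame({
    "Type of violence": ["Physical", "Sexual"],
    "Search Volume 5/2021": [100, 10],
    "Search Volume 6/2021": [200, 20],
})

# Melt to long format, then split "m/yyyy" out of the column names in one pass.
long_df = toy.melt(id_vars="Type of violence", var_name="period", value_name="volume")
long_df[["Month", "Year"]] = long_df["period"].str.extract(r"Search Volume (\d{1,2})/(\d{4})")
totals = long_df.groupby(["Year", "Month"], sort=False)["volume"].sum()
print(totals.tolist())  # [110, 220]
```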
Firstly, let's see and compare trends and values in this dataset year over year.
# prepare grid
fig = plt.figure(figsize=(14, 8), dpi=250)
gs = fig.add_gridspec(nrows=6, ncols=2)
#gs.update(vspace= -0.55)
ax0 = fig.add_subplot(gs[:2, 0:])
ax1 = fig.add_subplot(gs[2:4, 0])
ax2 = fig.add_subplot(gs[2:4, 1], sharex=ax1)
ax3 = fig.add_subplot(gs[4:, 0], sharex=ax1)
ax4 = fig.add_subplot(gs[4:, 1], sharex=ax1)
axes=[ax0, ax1, ax2, ax3, ax4]
columns = ['Total', 'Aggregate', 'Psychological', 'Physical', 'Sexual']
visuals_prepared.plot_lines_yoy(df_to_plot=df_to_plot_yoy, axes=axes, columns=columns)
# main title & legend
fig.suptitle('YOY Search Volumes. Total & by Violence type', x=0.165, y=1.02,
fontweight='bold', fontsize=12)
fig.legend([2020, 2021, 2022], loc='upper left', bbox_to_anchor=(0.01, 1.0),
frameon=False, title=None, ncol=3)
fig.tight_layout();
And the same slice, but from the point of view of growth (defined as the percentage change from $x_{i-1}$ to $x_{i}$)
# prepare grid
fig = plt.figure(figsize=(14, 8), dpi=250)
gs = fig.add_gridspec(nrows=6, ncols=2)
#gs.update(vspace= -0.55)
ax0 = fig.add_subplot(gs[:2, 0:])
ax1 = fig.add_subplot(gs[2:4, 0])
ax2 = fig.add_subplot(gs[2:4, 1], sharex=ax1)
ax3 = fig.add_subplot(gs[4:, 0], sharex=ax1)
ax4 = fig.add_subplot(gs[4:, 1], sharex=ax1)
axes=[ax0, ax1, ax2, ax3, ax4]
columns = ['Total', 'Aggregate', 'Psychological', 'Physical', 'Sexual']
df_to_plot_growth = pd.concat([df_to_plot_yoy[df_to_plot_yoy.columns[0:4].to_list() + ['Total']].pct_change(1),
df_to_plot_yoy[['Month', 'Year']]], axis=1)
visuals_prepared.plot_lines_yoy(df_to_plot=df_to_plot_growth, axes=axes, columns=columns)
# main title & legend
fig.suptitle('YOY Search Volumes Growth. Total & by Violence type', x=0.2, y=1.02,
fontweight='bold', fontsize=12)
fig.legend([2020, 2021, 2022], loc='upper left', bbox_to_anchor=(0.01, 1.0),
frameon=False, title=None, ncol=3)
fig.tight_layout();
Next, let's see trends and values in the search-volume dataset year-to-date (up to the last date present in the dataset).
fig, axes = plt.subplots(2, 2, figsize=(14, 12), dpi=250,
sharey=False, sharex=False);
axes = axes.flatten()
for indx, type_violence in enumerate(columns[1:]):
df_to_plot_ytd[type_violence].plot.line(ax=axes[indx], style='--', marker='.')
#np.log(df_to_plot).plot.line(figsize=(12, 8), ax=axes[indx]) # np.log1p
axes[indx].grid(axis='y', alpha=0.5, linestyle='-', color='gray')
axes[indx].grid(axis='x', alpha=0.2, linestyle='-.')
axes[indx].set_title(type_violence, loc='left')
axes[indx].legend(frameon=False) # ncol=2
if (indx + 1) % 2 == 0:
axes[indx].yaxis.tick_right()
axes[indx].spines[["top", "right"]].set_visible(False)
else:
axes[indx].spines[["top", "left"]].set_visible(False)
# main title & legend
fig.suptitle('YTD Search Volumes by Violence type and Discourse theme', x=0.25, y=1.0,
fontweight='bold', fontsize=12)
fig.tight_layout();